Exact Decoding on Latent Variable Conditional Models is NP-Hard
Latent variable conditional models, including latent conditional random
fields as a special case, are popular models for many natural language
processing and computer vision tasks. The computational complexity of exact
decoding/inference in latent conditional random fields has been unclear. In
this paper, we clarify the computational complexity of exact decoding: we
analyze the problem and demonstrate that it is NP-hard even in a sequential
labeling setting. Furthermore, we propose the latent-dynamic inference method
(LDI-Naive) and its bounded version (LDI-Bounded), which are able to perform
exact or almost-exact inference using top-$n$ search and dynamic programming.
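The abstract does not spell out the procedure, so here is a minimal brute-force Python sketch of the LDI idea under assumed toy scores (`emit`, `trans`) and an assumed latent-to-label projection: latent paths are consumed best-first, their probability mass is aggregated by projected label sequence, and the bounded variant stops once no competitor can overtake the current best. The real LDI replaces the exhaustive enumeration with top-$n$ search and dynamic programming.

```python
import itertools
import math
import random

# Toy latent chain with T positions and H latent states; each latent
# state h projects deterministically to label h // 2 (two latent
# states per label). All scores here are made up for illustration.
T, H = 4, 4
random.seed(0)
emit = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(T)]
trans = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(H)]

def path_weight(path):
    s = sum(emit[t][h] for t, h in enumerate(path))
    s += sum(trans[a][b] for a, b in zip(path, path[1:]))
    return math.exp(s)

# Brute-force enumeration sorted by weight stands in for the top-n
# search over latent paths used in practice.
paths = sorted(itertools.product(range(H), repeat=T),
               key=path_weight, reverse=True)
total = sum(path_weight(p) for p in paths)

mass, seen, best = {}, 0.0, None
for p in paths:                          # consume latent paths best-first
    w = path_weight(p)
    seen += w
    labels = tuple(h // 2 for h in p)    # project latent states to labels
    mass[labels] = mass.get(labels, 0.0) + w
    best = max(mass, key=mass.get)
    runner_up = max((v for k, v in mass.items() if k != best), default=0.0)
    # Stop once no competitor, even granted all unseen mass, can
    # overtake the current best: the decode is provably exact here.
    if mass[best] >= runner_up + (total - seen):
        break

print("decoded label sequence:", best)
```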
Hybrid Oracle: Making Use of Ambiguity in Transition-based Chinese Dependency Parsing
In the training of transition-based dependency parsers, an oracle is used to
predict a transition sequence for a sentence and its gold tree. However, the
transition system may exhibit ambiguity, that is, there can be multiple correct
transition sequences that form the gold tree. We propose to make use of this
property in the training of neural dependency parsers, and present the Hybrid
Oracle. The new oracle gives all the correct transitions for a parsing state,
which are used in the cross-entropy loss function to provide a better
supervisory signal. It is also used to generate different transition sequences
for a sentence to better explore the training data and improve the
generalization ability of the parser. Evaluations show that parsers trained
using the hybrid oracle outperform parsers using the traditional oracle in
Chinese dependency parsing. We also provide an analysis from a linguistic
perspective. The code is available at https://github.com/lancopku/nndep.
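One plausible reading of "all the correct transitions ... used in the cross-entropy loss" is a loss on the total probability mass assigned to the set of correct transitions; the paper may instead use, e.g., a uniform target distribution over them. A minimal NumPy sketch under that assumption (the function name and toy logits are illustrative, not from the paper):

```python
import numpy as np

def hybrid_oracle_loss(logits, correct_actions):
    """Cross-entropy against the set of all correct transitions.

    With a traditional oracle, `correct_actions` holds a single
    canonical transition; the hybrid oracle may supply several, and
    this loss rewards probability mass placed on any of them.
    """
    z = logits - logits.max()                  # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return -np.log(probs[list(correct_actions)].sum())

# Toy parsing state with 4 possible transitions; suppose both SHIFT (0)
# and LEFT-ARC (2) lead to the gold tree, so both count as correct.
logits = np.array([2.0, -1.0, 1.5, 0.3])
print(hybrid_oracle_loss(logits, {0, 2}))
```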
The largest singletons in weighted set partitions and its applications
Recently, Deutsch and Elizalde studied the largest and the smallest fixed
points of permutations. Motivated by their work, we consider the analogous
problems in weighted set partitions. Let $A_{n,k}$ denote the total
weight of partitions on $[n+1]$ with the largest singleton $\{k+1\}$. In this
paper, explicit formulas for $A_{n,k}$ and many combinatorial
identities involving $A_{n,k}$ are obtained by umbral operators and
combinatorial methods. As applications, we investigate three special cases such
as permutations, involutions and labeled forests. Particularly in the
permutation case, we derive a surprising identity analogous to the Riordan
identity related to tree enumerations, namely,
$\sum_{k=0}^{n}\binom{n}{k}D_{k+1}(n+1)^{n-k} = n^{n+1}$,
where $D_k$ is the $k$-th derangement number, i.e., the number of permutations
of $[k]$ with no fixed points. Comment: 15 pages
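The identity is easy to sanity-check numerically; the short script below verifies $\sum_{k=0}^{n}\binom{n}{k}D_{k+1}(n+1)^{n-k}=n^{n+1}$ for small $n$, computing derangement numbers via the standard recurrence $D_k = kD_{k-1} + (-1)^k$ with $D_0 = 1$:

```python
from math import comb

def derangement(n):
    # D_0 = 1, D_k = k * D_{k-1} + (-1)^k
    d = 1
    for k in range(1, n + 1):
        d = k * d + (-1) ** k
    return d

for n in range(1, 8):
    lhs = sum(comb(n, k) * derangement(k + 1) * (n + 1) ** (n - k)
              for k in range(n + 1))
    assert lhs == n ** (n + 1), n
print("identity verified for n = 1..7")
```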
A Generic Online Parallel Learning Framework for Large Margin Models
To speed up training, many existing systems parallelize online learning
algorithms. However, most research focuses mainly on stochastic gradient
descent (SGD) rather than other algorithms. We propose a generic online
parallel learning framework for large-margin models, and analyze our framework
on popular large-margin algorithms, including MIRA and the structured
perceptron. Our framework is lock-free and easy to implement on existing
systems. Experiments show that systems using our framework gain near-linear
speedup as the number of threads increases, with no loss in accuracy.
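The abstract gives no implementation details, but the lock-free pattern it describes resembles Hogwild-style shared-memory updates. Below is a minimal Python sketch with a plain binary perceptron standing in for the large-margin learner (the data and names are illustrative; CPython's GIL means this demonstrates the no-lock update pattern rather than a real parallel speedup):

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
w_true = rng.normal(size=20)
y = np.sign(X @ w_true)          # separable toy labels

w = np.zeros(20)                 # shared weights, updated with no lock

def worker(rows):
    for i in rows:
        # Read the (possibly stale) shared weights, then write the
        # update in place; no lock is taken, as in a lock-free framework.
        if y[i] * (X[i] @ w) <= 0:
            w += y[i] * X[i]     # perceptron update

threads = [threading.Thread(target=worker, args=(range(t, len(X), 4),))
           for t in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print("training accuracy:", np.mean(np.sign(X @ w) == y))
```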
Lock-Free Parallel Perceptron for Graph-based Dependency Parsing
Dependency parsing is an important NLP task, and the structured perceptron is
a popular approach to it. However, graph-based dependency parsing has a
decoding time complexity of $O(n^3)$, so it suffers from slow training. To
deal with this problem, we propose a parallel algorithm called the parallel
perceptron. The parallel algorithm can make full use of a multi-core computer,
which saves a great deal of training time. In experiments we observe that
dependency parsing with the parallel perceptron achieves 8-fold faster
training than traditional structured perceptron methods when using 10 threads,
with no loss at all in accuracy.
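For concreteness, here is a small sketch of the underlying (sequential) structured perceptron for arc-factored graph-based parsing; the hashed features, the toy sentence, and the greedy head-selection decoder (standing in for a real $O(n^3)$ projective or MST decoder) are all illustrative assumptions, and the paper's contribution is running such updates lock-free in parallel:

```python
import numpy as np

DIM = 256

def feats(sent, h, m):
    # Illustrative hashed features for the arc head -> modifier.
    f = np.zeros(DIM)
    f[hash((sent[h], sent[m])) % DIM] += 1.0
    f[hash(("dist", h - m)) % DIM] += 1.0
    return f

def decode(sent, w):
    # Greedy head selection stands in for a real O(n^3)/MST decoder.
    return [max((h for h in range(len(sent)) if h != m),
                key=lambda h: w @ feats(sent, h, m))
            for m in range(1, len(sent))]

def perceptron_update(sent, gold_heads, w):
    pred = decode(sent, w)
    for m, (g, p) in enumerate(zip(gold_heads, pred), start=1):
        if g != p:  # reward gold arc features, penalize predicted ones
            w += feats(sent, g, m) - feats(sent, p, m)

w = np.zeros(DIM)
sent = ["<ROOT>", "economic", "news", "had", "little", "effect"]
gold = [2, 3, 0, 5, 3]   # gold head index for each non-root token
for _ in range(10):
    perceptron_update(sent, gold, w)
print("predicted heads:", decode(sent, w))
```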
A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification
Text summarization and text simplification are two major ways to simplify text
for poor readers, including children, non-native speakers, and the
functionally illiterate. Text summarization produces a brief summary of the
main ideas of a text, while text simplification aims to reduce its linguistic
complexity while retaining the original meaning. Recently, most approaches to
text summarization and text simplification have been based on the
sequence-to-sequence model, which has achieved much success in many text
generation tasks. However, although the generated simplified texts are
literally similar to the source texts, they often have low semantic relevance.
In this work, our goal is to improve the semantic relevance between source
texts and simplified texts for text summarization and text simplification. We
introduce a Semantic Relevance Based neural model to encourage high semantic
similarity between texts and summaries. In our model, the source text is
represented by a gated attention encoder, while the summary representation is
produced by a decoder; the similarity score between the two representations is
maximized during training. Our experiments show that the proposed model
outperforms state-of-the-art systems on two benchmark corpora.
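A minimal sketch of the training objective this suggests: add a similarity term between the encoder's source representation and the decoder's summary representation to the usual generation loss. The exact combination in the paper may differ; `lam` and the toy vectors below are assumptions:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def semantic_relevance_loss(nll, src_repr, sum_repr, lam=0.5):
    # Maximizing similarity == minimizing (1 - cosine similarity);
    # lam trades generation quality against semantic relevance.
    return nll + lam * (1.0 - cosine(src_repr, sum_repr))

src_repr = np.array([0.2, 0.9, -0.3])   # e.g. gated-attention encoder state
sum_repr = np.array([0.1, 0.7, -0.2])   # e.g. final decoder hidden state
print(semantic_relevance_loss(nll=2.3, src_repr=src_repr, sum_repr=sum_repr))
```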
On cyclotomic elements and cyclotomic subgroups in K_2 of a field
The problem of expressing an element of K_2(F) in a more explicit form has
given rise to many works. To avoid a restrictive condition in a work of Tate,
Browkin considered cyclotomic elements as candidates for elements with an
explicit form. In this paper, we refine Browkin's conjecture about cyclotomic
elements into more precise forms; in particular, we introduce the notion of a
cyclotomic subgroup. In the rational function field case, we completely
determine the exact numbers of cyclotomic elements and cyclotomic subgroups
contained in a subgroup generated by finitely many different cyclotomic
elements, while in the number field case, using Faltings' theorem on the
Mordell conjecture, we prove that there exist subgroups generated by
infinitely many cyclotomic elements to the power of some prime which contain
no nontrivial cyclotomic elements.
Markov Chain Block Coordinate Descent
Block coordinate gradient descent (BCD) has been a powerful method for
large-scale optimization. This paper considers a BCD method that successively
updates a series of blocks selected according to a Markov chain. This kind of
block selection is neither i.i.d. random nor cyclic. On the other hand, it is
a natural choice for some applications in distributed optimization and Markov
decision processes, where i.i.d. random and cyclic selections are either
infeasible or very expensive. By applying mixing-time properties of a Markov
chain, we prove convergence of Markov chain BCD for minimizing Lipschitz
differentiable functions, which can be nonconvex. When the functions are
convex or strongly convex, we establish sublinear and linear convergence
rates, respectively. We also present a Markov chain inertial BCD method.
Finally, we discuss potential applications.
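A minimal NumPy sketch of the scheme on a least-squares objective: the block to update is chosen by a random walk on a ring of blocks (a selection that is neither i.i.d. nor cyclic), and only that block's coordinates take a gradient step. The problem sizes and step size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_blocks, blk = 5, 10
A = rng.normal(size=(80, n_blocks * blk))
b = rng.normal(size=80)
x = np.zeros(n_blocks * blk)

print("objective before:", 0.5 * np.linalg.norm(A @ x - b) ** 2)

# Random walk on a ring of blocks: from block i, move to i-1, i, or i+1.
# This block selection is neither i.i.d. nor cyclic, matching the setting.
i, step = 0, 1e-3
for _ in range(20000):
    sl = slice(i * blk, (i + 1) * blk)
    grad_blk = A[:, sl].T @ (A @ x - b)   # gradient w.r.t. block i only
    x[sl] -= step * grad_blk
    i = (i + rng.integers(-1, 2)) % n_blocks

print("objective after: ", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```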
A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction
Abbreviation is a common phenomenon across languages, especially in Chinese.
In most cases, if an expression can be abbreviated, its abbreviation is used
more often than its fully expanded form, since people tend to convey
information in the most concise way. For various language processing tasks,
abbreviation is an obstacle to improving performance, as the textual form of
an abbreviation does not express useful information unless it is expanded to
the full form. Abbreviation prediction means associating fully expanded forms
with their abbreviations. However, due to the deficiency of abbreviation
corpora, this task has been limited in current studies, especially considering
that general abbreviation prediction should also cover those full-form
expressions that do not have valid abbreviations, namely the negative full
forms (NFFs). Corpora incorporating negative full forms for general
abbreviation prediction are few in number. In order to promote research in
this area, we build a dataset for general Chinese abbreviation prediction,
which requires a few preprocessing steps, and evaluate several different
models on the built dataset. The dataset is available at
https://github.com/lancopku/Chinese-abbreviation-dataset.
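Abbreviation prediction is commonly cast as character-level sequence labeling: each character of the full form is labeled as kept or dropped. A small sketch under that formulation (the greedy alignment and the convention that an NFF keeps every character are assumptions, not details from the paper):

```python
def char_labels(full_form, abbreviation):
    """Label each character of the full form: 1 = kept in the
    abbreviation, 0 = dropped (greedy left-to-right alignment)."""
    labels, j = [], 0
    for ch in full_form:
        if j < len(abbreviation) and ch == abbreviation[j]:
            labels.append(1)
            j += 1
        else:
            labels.append(0)
    assert j == len(abbreviation), "abbreviation must be a subsequence"
    return labels

print(char_labels("北京大学", "北大"))   # [1, 0, 1, 0]
# A negative full form (NFF) has no shorter form, so every character
# is kept -- its label sequence is all ones.
print(char_labels("人民", "人民"))       # [1, 1]
```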
Learning Sentiment Memories for Sentiment Modification without Parallel Data
The task of sentiment modification requires reversing the sentiment of the
input while preserving the sentiment-independent content. However, aligned
sentences with the same content but different sentiments are usually
unavailable. Due to the lack of such parallel data, it is hard to extract
sentiment-independent content and reverse the sentiment in an unsupervised
way, and previous work usually cannot reconcile sentiment transformation with
content preservation. In this paper, motivated by the fact that the
non-emotional context (e.g., "staff") provides strong cues for the occurrence
of emotional words (e.g., "friendly"), we propose a novel method that
automatically extracts appropriate sentiment information from learned
sentiment memories according to the specific context. Experiments show that
our method substantially improves the degree of content preservation and
achieves state-of-the-art performance. Comment: Accepted by EMNLP 2018.
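A toy dictionary version of the idea: non-emotional context words index a memory of emotional words per polarity, and the emotional word in the input is replaced by one retrieved for the target sentiment. The real model learns these memories and the retrieval with attention; the table and helper below are purely illustrative:

```python
# Sentiment memory keyed by non-emotional context words; each entry
# stores emotional words observed with that context, per polarity.
memory = {
    "staff": {"positive": "friendly", "negative": "rude"},
    "pizza": {"positive": "delicious", "negative": "bland"},
}
emotional = {w for entry in memory.values() for w in entry.values()}

def modify_sentiment(tokens, target):
    contexts = [t for t in tokens if t in memory]
    out = []
    for tok in tokens:
        if tok in emotional and contexts:
            # Replace the emotional word with one retrieved from the
            # memory of a non-emotional context word in the sentence.
            out.append(memory[contexts[0]][target])
        else:
            out.append(tok)
    return " ".join(out)

print(modify_sentiment("the staff is friendly".split(), "negative"))
# -> the staff is rude
```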
- …